Total Expected Discounted Reward MDPs: Existence of Optimal Policies
Abstract
This article surveys results on the existence of optimal and nearly optimal policies for Markov decision processes (MDPs) with total expected discounted rewards. The problem of optimizing total expected discounted rewards for MDPs is also known as discounted dynamic programming.
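For reference, the criterion in question can be written as follows; this is a standard formulation, and the notation (x_n, a_n, r, \beta) is ours rather than the article's:

    % Total expected discounted reward of policy \pi from initial state x,
    % with one-step rewards r and discount factor \beta \in [0,1):
    v^{\pi}(x) = \mathbb{E}^{\pi}_{x}\left[\sum_{n=0}^{\infty} \beta^{n}\, r(x_n, a_n)\right],
    \qquad v(x) = \sup_{\pi} v^{\pi}(x).

A policy \pi is optimal if v^{\pi} = v, and nearly (\varepsilon-)optimal if v^{\pi}(x) \ge v(x) - \varepsilon for all states x.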
Similar Articles
Reduction of Discounted Continuous-Time MDPs with Unbounded Jump and Reward Rates to Discrete-Time Total-Reward MDPs
This article discusses a reduction of discounted Continuous-Time Markov Decision Processes (CTMDPs) to discrete-time Markov Decision Processes (MDPs). This reduction is based on the equivalence of a randomized policy that chooses actions only at jump epochs to a nonrandomized policy that can switch actions between jumps. For discounted CTMDPs with bounded jump rates, this reduction was introduc...
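For the bounded-rate case, the reduction usually takes a uniformization-style form; a hedged sketch in our notation (discount rate \alpha > 0, transition rates q(y|x,a), total jump rate q(x,a) \le \Lambda), not necessarily the exact construction of this article:

    % Equivalent discrete-time discounted MDP:
    \beta = \frac{\Lambda}{\alpha + \Lambda}, \qquad
    \tilde{p}(y \mid x,a) =
      \begin{cases}
        q(y \mid x,a)/\Lambda, & y \neq x, \\
        1 - q(x,a)/\Lambda,    & y = x,
      \end{cases}
    \qquad
    \tilde{r}(x,a) = \frac{r(x,a)}{\alpha + \Lambda}.

Under this construction the discounted value of the CTMDP coincides with the \beta-discounted value of the discrete-time MDP.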
On the Reduction of Total-Cost and Average-Cost MDPs to Discounted MDPs
This paper provides conditions under which total-cost and average-cost Markov decision processes (MDPs) can be reduced to discounted ones. Results are given for transient total-cost MDPs with transition rates whose values may be greater than one, as well as for average-cost MDPs with transition probabilities satisfying the condition that there is a state such that the expected time to reach it ...
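The simplest instance of such a reduction, sketched in our notation and under stronger assumptions than the paper needs: if the (substochastic) transition kernel q satisfies \sum_y q(y|x,a) \le \beta < 1 for all state-action pairs, augment the state space with an absorbing, reward-free state \bar{x} and set

    p(y \mid x,a) = q(y \mid x,a)/\beta, \qquad
    p(\bar{x} \mid x,a) = 1 - \sum_{y} q(y \mid x,a)/\beta .

Since \beta\, p(y \mid x,a) = q(y \mid x,a), the n-step weights match, and the total expected reward in the original model equals the \beta-discounted reward in the normalized one.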
2 Finite State and Action MDPs
In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite horizon models. For the finite horizon model the utility function of the total expected reward is commonly used. For the infinite horizon the utility function is less obvious. We consider several criteria: total...
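For the finite discounted model described here, value iteration is the textbook solution method; a minimal sketch in Python, where the arrays P and R and the discount beta are illustrative toy data, not taken from the chapter:

    import numpy as np

    def value_iteration(P, R, beta, tol=1e-8):
        """Value iteration for a finite discounted MDP.

        P: transitions, shape (A, S, S); P[a, s, t] = Pr(t | s, a).
        R: one-step rewards, shape (A, S).
        beta: discount factor in [0, 1).
        Returns the optimal values and a greedy (optimal) policy.
        """
        v = np.zeros(P.shape[1])
        while True:
            # Bellman optimality backup over all actions at once.
            q = R + beta * (P @ v)            # shape (A, S)
            v_new = q.max(axis=0)
            if np.max(np.abs(v_new - v)) < tol:
                return v_new, q.argmax(axis=0)
            v = v_new

    # Toy instance: 2 states, 2 actions.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.0, 1.0]]])
    R = np.array([[1.0, 0.0],
                  [2.0, -1.0]])
    v, policy = value_iteration(P, R, beta=0.95)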
Heuristic Search for Generalized Stochastic Shortest Path MDPs
Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of the Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper's main contribution is the description o...
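In standard notation (ours, written for the discounted case), the two properties just named read:

    % Bellman optimality equation and the greedy policy it induces:
    V^{*}(x) = \max_{a}\Big[ r(x,a) + \beta \sum_{y} p(y \mid x,a)\, V^{*}(y) \Big],
    \qquad
    \pi^{*}(x) \in \arg\max_{a}\Big[ r(x,a) + \beta \sum_{y} p(y \mid x,a)\, V^{*}(y) \Big].

For SSPs the same equations are written with \beta = 1, costs in place of rewards, and max replaced by min.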
6 Total Reward Criteria
This chapter deals with total reward criteria. We discuss the existence and structure of optimal and nearly optimal policies and the convergence of value iteration algorithms under the so-called General Convergence Condition. This condition assumes that, for any initial state and for any policy, the expected sum of positive parts of rewards is finite. Positive, negative, and discounted dynamic pr...
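Written out (our notation, with r^{+} = \max(r, 0) the positive part of the reward), the General Convergence Condition is:

    \mathbb{E}^{\pi}_{x}\left[ \sum_{n=0}^{\infty} r^{+}(x_n, a_n) \right] < \infty
    \qquad \text{for every initial state } x \text{ and every policy } \pi .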